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Abstract 

Advanced algorithms are necessary to obtain faster-than-real-time dynamic sim- 
ulations in a number of different physical problems that are characterized by widely 
disparate time scales. Recent advanced dynamic Monte Carlo algorithms that pre- 
serve the dynamics of the model are described. These include the n-fold way al- 
gorithm, the Monte Carlo with Absorbing Markov Chains (MCAMC) algorithm, 
and the Projective Dynamics (PD) algorithm. To demonstrate the use of these 
algorithms, they are applied to some simplified models of dynamic physical sys- 
tems. The models studied include a model for ion motion through a pore such as a 
biological ion channel and the metastable decay of the ferromagnetic Ising model. 
Non-trivial parallelization issues for these dynamic algorithms, which are in the 
class of parallel discrete event simulations, are discussed. Efforts are made to keep 
the article at an elementary level by concentrating on a simple model in each case 
that illustrates the use of the advanced dynamic Monte Carlo algorithm. 



1 



1.0 Introduction 



Classical Monte Carlo methods have been used to study a wide variety of different systems 
since they were first proposed by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller 
[0. The Monte Carlo method can be viewed as a method to perform mult i- dimensional 
integrals @. For example, to obtain static critical exponents for model systems such 
as Ising, Potts, and classical Heisenberg models, Monte Carlo simulations are used to cal- 
culate multidimensional sums or integrals @, [5| . A significant number of advanced Monte 
Carlo algorithms have been developed for static critical phenomena. These include cluster 
algorithms [6-8], multicanonical algorithms [9-15], and histogram reweighting methods 
[16-18] 

In its simplest form, the Monte Carlo algorithm is like a movie with each frame a 
three-step process. First, generate a (pseudo-)random number. In this work we will only 
use random numbers uniformly distributed between and 1. Second, choose a trial move 
from the current state to a new state. Third, accept or reject this trial move depending on 
the current random number and an acceptance probability for the trial move. The sums 
or integrals to be calculated are then updated using the new frame, and the procedure is 
iterated repeatedly. 

Calculations of static behaviors, such as multi-dimensional integrals or static critical 
exponents, allow a wide variety of trial moves. This includes, for example, cluster and 
multicanonical algorithms. This is so because in static calculations only the final sum 
(or integral) is of interest. The particular trial moves have no physical meaning. Such 
freedom of trial moves no longer exists if the Monte Carlo algorithm is applied to a 
dynamic problem where the physical meaning of the trial moves is an essential part of 
the model. Fortunately, some advanced Monte Carlo algorithms exist to make many of 
these calculations efficient without changing the meaning of the trial moves. It is these 
advanced algorithms that will be described and applied in this article. 

2.0 Standard Dynamic Monte Carlo Methods 

It is also possible to ascribe physical meaning to certain trial moves. Consider a one- 
dimensional random walker on a lattice. At each tick of a clock the walker has a certain 
probability to move to the left, the right, or to stay where it is. In this case, the actual 
Monte Carlo trial (move left, move right, or stay) has a physical meaning. Now imagine 
that the probabilities of moving to the left or to the right are small. Then, most of the 
time the walker stays where it is, and it will require a large amount of computer time 
before the walker moves a substantial distance. The advanced dynamic algorithms for this 
problem, described in Sec. 4, keep this physical dynamic intact, but decrease the amount 
of computer time required for the walker to reach a specified location. 

Consider the ferromagnetic Ising model on a regular lattice with periodic boundary 
conditions. Each site of the lattice has a spin o"j=±l. From the Hamiltonian, the energy 
of a particular spin configuration is given by 

£ = -J^OiOj ~ H^CTi, (1) 

where the coupling constant J>0 and the external magnetic field is H. The first sum of 
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Eq. (0) is over all nearest-neighbor spin pairs and the second sum is over all N Ising spins. 
One can use the simple Monte Carlo procedure described in this section or advanced meth- 
ods such as a cluster method, to obtain the static critical behavior, such as the internal 
energy, specific heat, susceptibility, or order parameter at a particular temperature T and 
field H . A particularly simple trial move involves a single spin flip, i.e. letting crj— <7j. 
However, one can also consider these Monte Carlo trial moves to be due to the interaction 
of the Ising spins with a heat bath. Then the dynamics of the simulation is of physical 
relevance, and methods such as the cluster method cannot be used since they change the 
dynamics. 

The dynamics for the kinetic Ising model can be derived using a Master equation 
approach Jl9|, [20|, |2l|. In 1977 Martin [|^] considered a lattice of quantum spin | particles, 



each coupled to their own heat bath, and showed that under certain assumptions the same 
dynamic as for the kinetic Ising model was obtained. This dynamic is: 

• Randomly choose an Ising spin. If there are N Ising spins, the probability of choosing 
each spin is 1/N. 

• Calculate a random number f. 

• Calculate (or look up in a stored table) the energy E Q \& of the current Ising config- 
uration and E new of the configuration with the chosen Ising spin flipped. 

• Change to the new configuration, i.e. flip the chosen Ising spin, if 

f < exp(-E new /k B T)/ [exp(-E D i d /k B T) + exp(-E new /k B T)] . (2) 

• Increment the time by one Monte Carlo step (mcs), i.e. by one spin-flip attempt. 

This Monte Carlo dynamic has physical meaning. Here T is the temperature and k B is 
Boltzmann's constant. Furthermore, the time scale for a single spin-flip trial is set by the 
heat bath. This means that for Ising spins interacting with phonons, one spin-flip attempt 
per spin (one Monte Carlo step per spin, defined to be one MCSS) should correspond to 
a time on the order of an inverse phonon frequency, roughly 10~ 13 seconds. Thus, to 
simulate this simple model for a time of one second would take 10 13 Monte Carlo steps 
per spin. If you could get your computer to perform one spin-flip attempt for each CPU 
clock tick (roughly 10~ 9 sec), then a one-second simulation would take about three hours 
of CPU time per spin! Your simulation would take 10 4 iV times longer than physical time. 
To simulate metastable decay for the Ising model, as discussed below in Sec. 4.1, one is 
interested in time scales of years to decades. To simulate your Ising model for a iV=100 
spin system would take about 12 days of CPU time to simulate for one second, about 2 
years of CPU time to simulate for one minute, and about one century of CPU time to 
simulate for one hour. These CPU times are much longer than I can wait for computer 
results, and we are interested in simulating much larger lattices and much longer times. 
The sections below describe advanced dynamic algorithms that allow these simulations 
to be performed in a reasonable amount of CPU time without changing the underlying 
dynamic. The goal is to obtain faster-than-real-time dynamic simulations. 

The dynamic algorithm described above, using Eq. (0) for the Ising model, is called 
the Glauber dynamic. It was introduced by Glauber |L9| to study the dynamics of a one- 
dimensional Ising chain. For higher-spin models, and sometimes also for the Ising model, 
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this type of dynamic is called a heat-bath dynamic. Another popular choice @, || for the 
flip probability for Ising spins is the Metropolis dynamic M, which always flips the spin 
if the energy of the new configuration is lower than that of the old. Thus the spin flips if 

f < min {1, exp \(E old - E ncw )/k B T]} . (3) 

The dynamic that should be used in a particular dynamic simulation is the one that has 
been derived from the underlying physical system. Both of these dynamics satisfy detailed 
balance, and hence will obtain the correct statics for the Ising model. 

Note that the dynamic involves a random choice of the spin on which a spin flip will 
be attempted. Performing spin-flip attempts sequentially on the lattice will lead to a 
different dynamic. It has been shown in nucleation-and-growth studies of the lifetime of 
the metastable state in an Ising model that the use of sequential instead of random updates 
can lead to different physical results. In particular, the pre-factor for the nucleation rate 
is different [O. 



3.0 A Brief Review of Absorbing Markov Chains 

Many of the advanced algorithms described in the following sections will use absorbing 
Markov chains. A brief introduction to some of the important features of absorbing 
Markov chains is presented here. 

Consider an absorbing Markov chain with s transient states and r absorbing states. 
The system starts in one of the s transient states, and remains in the transient subspace 
of the s transient states until it is absorbed into one of the r absorbing states. We will use 
the mathematician's notation of the current state vector being operated on by a matrix 
from the right. (As opposed to the normal physicist's convention.) This will allow the 
reader to make easy contact with the literature on absorbing Markov chains, for example 

The transition matrix defining an absorbing Markov chain has the form 

M (r+s)x(r , +s) = f^ xr ^ xs \ (4) 

where the sizes of the matrices have been explicitly shown. The matrix I is an identity 
matrix, the matrix is a matrix with all zero elements, the matrix T is called the transient 
matrix, and the matrix R is called the recurrent matrix. M is a Markov matrix, i.e., each 
of its row sums equal one, since at each clock tick the system must jump to a new state 
or remain in the current state. 

The matrix that governs the motion of the system after m time steps (or m clock 
ticks) is given by matrix multiplication to be 

M m = ( ^-rxr Orxs \ /r\ 

ivi (r+5)x(r+s) ^ I + T + ... + Tm -l) sxsRsxr T mJ • W 

The system must initially be in the transient subspace, so the s-state initial vector is iff 
and the total initial vector for the matrix M is given by ( T iff ) where the vector of 
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zeros has r elements. (The superscript T denotes the transponse.) Applying M m to this 
initial vector gives the (r+s)-dimensional vector 

(0* n[)M m = (v?(I + T + --- + T m - 1 )R v?T m ) . (6) 

Introduce the vector e with all elements equal to unity. Then the probability of still being 
in the space of the s transient states after m time steps is given by 

Pstill in transient subspace Uj T 6 . (7) 

The probability of the state having exited to each of the r absorbing states after m time 
steps is given by the elements of the vector 

^absorption after m time steps = V l (l + T + • • • + T j R . (8) 

The probability of having exited to each of the r absorbing states after m time steps is 
the sum of the probabilities of having exited in the first time step (which is the term 
IR) plus the probability of having exited after the second time step (the term TR) plus 
the probability of having exited in time steps 3, 4, • • -, m. Thus the probability that the 
system exits to each of the r absorbing states, given that it exits at time step m, is given 
by the elements of the vector 

-r _ 7= fT^R 

^absorption given exit at time m Q v r ^T m ~^J\,e ' 

Equations (|7]) and (Q) are the equations needed to perform Monte Carlo with Absorbing 
Markov Chain calculations. 

In the absorbing Markov chain literature the sxs matrix 

N = (I - TT 1 = I + T + T 2 + ■ • • , (10) 

called the fundamental matrix, is introduced. Then the probability that the system ends 
up in a particular absorbing state is given by the elements of the vector 

^absorbed into particular state ^1 NR • (H) 

The average lifetime to exit from the transient subspace is given by 

( T ) = vjNe. (12) 

The units for (r) are the time units of the clock, with one attempted move every tick of 
the clock. This can be derived using the fact that the probability to be absorbed at time 
step m is 

^absorbed at time m = V I ^T" 1 1 — T"^ e , (13) 

so that 



(r) = ^^(T- 1 



m = l 



(I + T 1 + T 2 + • 

^(I-T) _1 e = vjNe. (14) 



5 



The fundamental matrix can also be used to calculate the moments of the lifetime for the 
system, that is the moments of the exit time probabilities. These are given by J25 



( T 2 ) = vj (2N 2 - N) e (15) 

and 

(t^ = vj (6N 3 - 6N 2 + N) e (16) 

and 

^r 4 ) = vj (24N 4 - 36N 3 + 14N 2 - N) e . (17) 

4.0 Particle Motion Through a Pore 

In order to illustrate the advanced dynamic Monte Carlo algorithms, we introduce an 
extremely simple one- dimensional walk. This one-dimensional walk can be viewed as a 
first step in simulating motion of ions through biological ionic channels [27-32]. These ion 
channels control the motion of ions through cell membranes, and they are consequently of 
importance in understanding cell function and drug delivery to cells. The model is kept 
simple to allow us to introduce the dynamic Monte Carlo algorithms in a transparent 
fashion. 

The model is a one- dimensional walk on a lattice. Our particle (the ion) starts at 
lattice site zero (on the left) and performs a random walk until it is absorbed at the 
right-most lattice site. Each lattice site i has an energy We will study a 16-site lattice 
(16 transient states and one absorbing state) with energies shown in Fig. [1|. The energies 
for lattice sites through 16 are (in dimensionless units) — |, 1, 1, 0, |, 0, 0, ~, 0, 0, |, |, 
|, 0, |, 0, — 1. The dynamic we introduce is that at each time step the walker randomly 
picks whether it will try to perform a move to the left or to the right. The probability of 
actually performing this move is given by 

= exp(-E i±1 /k B T) ^ ig ^ 

exp(-Ei/k B T) + exp(-E i±1 /k B T) 

where the + (— ) sign is for the attempted move to the right (left). When the walker is in 
lattice site 0, the move to the right is always attempted. We are interested in the average 
lifetime (r) for a walker starting at lattice site before it is absorbed at lattice site 16. 

4.1 Ion Channel Standard Monte Carlo 

The standard Monte Carlo dynamic for this problem is extremely simple to program. 
Each algorithmic step uses two uniformly distributed random numbers, r and f. The 
algorithm is: 

• Place the walker at lattice site 0. 

• Set the time t to zero. 

• Perform the following steps until the walker is absorbed when it reaches lattice 
site 16. 
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o Set the time t to t—t+1. 

o Calculate two random numbers, r and f. 

o If the walker is not at lattice site and f is less than |, then attempt a move 
to the left. Otherwise attempt a move to the right. 

o Calculate (or look up in a stored table) the probability of moving, p move from 
Eq. qnj). 

o Move the walker if f<p move . 

This algorithm is executed K times, and the statistics for the mean lifetime are accumu- 
lated. The average lifetime, calculated for if=1000 escapes, is shown in Fig. |2|. The time 
unit is Monte Carlo steps (mcs), so the lifetime corresponds to the average number of 
clock ticks before the walker reaches lattice site 16. 



4.2 Ion Channel n-fold Way Monte Carlo (Event Driven Simula- 
tion) 

As can be seen from Fig. |^, the average lifetime at low temperatures can be very long. The 
time unit is Monte Carlo steps, mcs, for our single walker. At T=0.05 (in dimensionless 
units with /cb=1) the value for r is about 3xl0 13 mcs. If each mcs took a nanosecond of 
computer time (a very fast implementation on an extremely fast computer), the compu- 
tational time required for 1000 escapes would require 3xl0 7 seconds of CPU time. This 
corresponds to about one year of computer time! (For my implementation of the stan- 
dard algorithm on a Cray SV1 supercomputer this calculation would have taken about 
32 years!) I did not spend this amount of computer time on the calculation. Rather, I 
utilized an advanced algorithm to perform my dynamic simulation at this temperature in 
35 minutes on a Cray SV1 computer. 

The first advanced algorithm I used corresponds to the n-fold way algorithm intro- 



duced by Bortz, Kalos, and Lebowitz in 1975 [fffil , as implemented in discrete time [ j34fl . 
This and related similar schemes have been put forward and utilized in the simulations of 
many problems. Just a few of these include crystal growth [BR BJ1 , renormalization group 



methods for obtaining critical exponents [BJJ, growth of nanostructures [38], [39|, dynam 



ics of simple models of glasses [f40| , j4l|, lattice models for protein folding fl42"| , thermal 

catalytic surface 



decay of recorded information |43 
reactions 
models 



Potts models for domain growth [|4 



aging on long time scales E7 



48| , crossover phenomena in the anisotropic Ising model E9 



ground state calculations of spin glass 
domain boundaries 



in anisotropic Ising models []50| |, multidimensional minimization of functions via simulated 
dynamic recrystallization 



annealing [51 



52J, and collective surface diffusion [53 



These algorithms are also called event-driven algorithms in the discrete-event simu- 
lation community A discrete-event simulation is one in which the state vector of 
the system changes discontinuously at particular (possibly random) time intervals. In 
this case, the time intervals correspond to the times at which the walker jumps from one 
state to another {not the times at which a step is attempted but then rejected). The idea 
behind event-driven simulations is to change the state of the system in one algorithmic 
step, and to increment the time by the time required to make that transition. This is 
where the absorbing Markov chain methodology is useful. 
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Let the transient state for an absorbing Markov chain be the current state, and the 
two absorbing states be the walk to the left or to the right. Then the Markov matrix is 




M ion i=0 1 (19) 



ln(r) 



In(l-Pleft-Pright) 



1. 



where p r i g ht is the probability of jumping to the right and p\ e f t is the probability of jumping 
to the left. These probabilities are given by p mov c/2 [from Eq. (|l^)] where the factor of 
1/2 is needed because a move attempt to the left or right is chosen at random. If the 
walker is at lattice site 0, then pi e ft=0 and p r ight=Pmove- Since the walker is in the current 
state, the initial state vector is vf = ( 1 ). 

Each algorithmic step again uses two uniformly distributed random numbers, f and 
f. The algorithm to repeat for K escapes is: 

• Place the walker at lattice site 0. 

• Set the time clock to t=0. 

• Perform the following steps until the walker is absorbed at lattice site 16. 

o Calculate two random numbers, f and f. 

o Calculate (or look up from tables) the two probabilities p r i g ht and pM t - 

o Calculate the time m to exit from the current state m 

where \_z\ is the integer part of z. 
o Move the walker to the right if r< — — . Otherwise move the walker to the 

° — Pright+Pleft 

left. 

o Increment the time to t+m. 

In a primitive way, if the probability to move is very small, say 0.001 per time step, then 
instead of increasing the time by one unit for each of the mostly unsuccessful attempts to 
move, we always move the particle and increase the time by on average 1000 units, thus 
saving computing effort by a factor 1000. 

A number of remarks are required. The number of time steps m to exit the current 
state is given by Eq. fl?|). For this simple case where there is only one transient state 
(s=l), the matrix T=Tn=l— pi c ft— bright is a scalar, and Eq. (|7|) gives the time of exit 
from the transient state to be T™<f<T 1 7 _1 . This reduces to the equation for m in terms 
of the natural logarithms. Also, since T is a scalar, there is no m dependence on the 
probabilities to exit to the left or right state since from Eq. (Q) the factor of T^~ l cancels 
due to the normalization. 

Using the event-driven simulation, the time to measure 7^=1000 escapes of the al- 
gorithm at T=0.05 would be 10 3 minutes, or about 17 hours of Cray SV1 time. For 
this temperature the event-driven simulation is about one million times faster than the 
standard Monte Carlo algorithm! 
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4.3 Ion Channel s=2 MCAMC 



At low temperatures, even the event-driven simulation can take a large amount of com- 
puter time. This is principally due to two portions of the energy landscape, namely the 
lattice site pairs (5,6) and (8,9) in Fig. [I]. In these situations it is advantageous to use 
more than one state in the transient subspace. We will describe the case of two transient 
states, so a s=2 Monte Carlo with Absorbing Markov Chains (MCAMC) will be used 
whenever a walker steps into the pair (5,6) or the pair (8,9) in Fig. |l| At low tempera- 
ture the probability of stepping out of this pair is exponentially small compared to the 
probability of stepping to the other site of the pair. Consequently, the walker performs 
many steps within the pair before it eventually steps out of the pair of lattice sites. We 
have chosen the energies of these lattice pairs to make the s=2 MCAMC algorithm easy 
to describe. The general case can easily be constructed using the absorbing Markov chain 
methodology of Sec. 3.0. In our case, the sites of the pair (5,6) or (8,9) have the same 
energy (chosen to be zero). Also, the bordering lattice sites 4 and 7 (or 7 and 10) have 
the same energy. We will use the s=l MCAMC, the event-driven simulation of Sec. 4.2, 
unless the walker has just stepped onto a lattice site pair (5,6) or (8,9). Let x be the 
probability of stepping from site 5 to 4 in one time step, so 



1 

x — 

2 exp 



exp(-E 4 /k B T) 

-E 5 /k B T) + exp{-E 4 /k B T) 



(20) 



The probabilities of stepping from 6 to 7 or from 8 to 7 or from 9 to 10 in one time step 
are also equal to x. The probability of stepping from 5 to 6 in one time step is which 
is also the probability of stepping from 6 to 5, 8 to 9, or 9 to 8. 
The Markov matrix for this s=2 transition is 



Mi 



ion 2 



fl 














1 








X 

\o 




X 


\-x-\ 
i 
i 


1 

4 

1 — X 



\ 



(21) 



In our case we have 



1 



x 
i 
i 



l 

4 
X 



[1 



X 



Thus the time m to exit from the pair of lattice sites is given by 

ln(f) 



m 



ln(l — x) 



+ 1 . 



(22) 



(23) 



To calculate the probability of exiting to either side from a lattice site pair, for the 
normalization use 

/ 1 \ 

\m— 1 



7 T T m-l 



(5) = (i-^ 

Also use the spectral decomposition of T 2 , which is 



im-l 



( ] ) ( 1 1) + 



1 /l 



2 V2 



— x 



m—l 



-1) • 



(24) 



(25) 
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This gives from Eq. ([|) that the probability of exiting the lattice pair back to the same 
lattice point the walker entered from is 



1 

Pcxit back 



/ \ \ m— 1 

1 + 



2 X 



X 



(26) 



and the probability of exiting to the lattice site opposite to the site it entered the lattice 
pair from is 



1 

Pexit other ^ 



1 \ m—1 

2 ~ X \ 



(27) 



\1 — x J 

The algorithm to use s=2 MCAMC for these two lattice pairs is: 

• Place the walker at lattice site 0. 

• Set the time clock to t=0. 

• Perform the following steps until the walker is absorbed at lattice site 16. 

o Calculate two random numbers, f and f. 

o If the walker has not just stepped onto lattice site 5, 6, 8, or 9 then use the 
rejection-free (s=l MCAMC) algorithm described in Sec. 4.2: 

□ Calculate (or look up from tables) the two probabilities p r i gri t and p\ e f t . 



□ Calculate the time m to exit from the current state m- 



ln(f) 



ln(l-pi eft -p rig ht) 



+ 1. 



□ Move the walker to the right if r< — — , otherwise move the walker to 

° — Pright+Pleft ' 

the left. 

o If the walker has just stepped onto lattice site 5, 6, 8, or 9 then use s=2 
MCAMC: 

□ Calculate (or look up from tables) the probability x from Eq. (p0[). 

□ Calculate the time m to exit the pair of lattice points using Eq. ([2~3|). 

□ Calculate (or look up from tables) p exit back from Eq. (|26|) . 

□ If p ex it back<^ then move the walker to the site it entered the lattice pair 
(5,6) or (8,9) from. Otherwise move the walker to the other exit lattice 
site of the pair. 

o Increment the time to t+m. 

Using this s=2 MCAMC algorithm at T=0.05 for 7^=1000 escapes takes about 0.35 
minutes of CPU time. This is about 5000 times less than the s=l MCAMC (n-fold way) 
algorithm, and about 10 10 times faster than the traditional Monte Carlo algorithm! 

The timings for all three algorithms are shown in Fig. |^. The timing for the standard 
Monte Carlo algorithm is proportional to the average lifetime shown in Fig. |2|. Figure 0(b) 
shows the crossovers between the timings for the various algorithms. The s=2 MCAMC 
turns out to be always at least as fast as the s=l MCAMC. This is not the usual case, 
and is due to the very simple form for T 2 . The standard Monte Carlo is faster at high 
temperatures, with the s=2 MCAMC being faster at temperatures below about T=\. 
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4.4 Ion Channel Projective Dynamics 



This simple model can actually be solved analytically by noting that the transient matrix 
is tridiagonal and has the form 



16 



/l- 


s 15 — 9l5 


$15 











° \ 




914 1 " 


~ s 14 — 9l4 


S 14 




































92 


1 - s 2 - g 2 


s 2 
















9i 


1 - si - gi 


Si 


V 














9o 


i-gJ 



(2f 



where is the single-time-step transition probability from i to i+1 (the 'growning prob- 
ability') and Si is the single-time-step transition probability from i to i— 1 (the shrinking 
probability). In other words, we have changed notation slightly and g$ are the appropriate 
Pright and Si are the appropriate pi e f t . The walker starts in state 0, so the initial vector is 







1 ) . The recurrent matrix is given by 



R 



16 



(9is\ 




V o J 



(29) 



since the walker is absorbed when it steps onto state 16 from state 15. 

Introduce the average time h(i) spent in state % during this absorbing Markov process. 
Then the average lifetime is the sum of these residence times, 



(r) 



15 
i=0 



(30) 



An iterative solution for the h(i) terms can be found by considering the number of times 
during K escapes that a walker passes an imaginary midpoint between state i and %—\. 
Since there are K escapes and each escape must pass through each mid-point, there will 
be K more walks to lattice site i than to lattice site %—\. During K walks the average 
time spent in state i is h(i) K. Thus 



g l -ih(i — 1)K — K + Si h(i)K . 
For site 16 there is no shrinking probability and gi§h(Voi)K = K or 

h(15) = — . 

915 



Solving for the other h(i) values gives 

h{i - 1) 



1 + Sih(i) 
9i-i 



(31) 



(32) 



(33) 



Thus we have a closed form for the lifetime for this simple process. This closed form is 
shown as the solid line in Fig. |^. Note that the time unit is clock ticks, i.e. mcs (Monte 
Carlo trial steps). 
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The model we used for the ion channel has only one entrance of the ion (from the left). 
A more realistic ion channel model has the possibility of entering and exiting from both 



ends of the channel |28| . For our simple model this only means that the state with no ions 
in the channel needs to be included together with the possibility of entering either end 
of the channel from this state and being absorbed by exiting from either end. Again, the 
absorbing Markov chain is tridiagonal. It now has two absorbing states. In addition to r, 
other quantities of interest include the average time to exit one end, given that it entered 
from a particular end. These times can again be calculated using the methodology of 
absorbing Markov chains. 

In the sections below, we will utilize this formulation in the projective dynamics 
method for the escape of more complicated models from the metastable state [|55f| . In 
the majority of situations there is no analytic expression, but the same formulation can 
be used, see Sec. 5.4. 



5.0 Dynamic Ising Simulations 

The simple model for ion channels introduced above illustrates many of the advanced 
Monte Carlo methods, but it is too simple to completely illustrate their use. We now 
describe advanced dynamic Monte Carlo simulations for a more complicated but still 
simple model, the square-lattice Ising ferromagnet described in Sec. 2.0. This model has 
N Ising spins o which can be either up (+1) or down (—1) at each lattice site. We will 
consider the model with periodic boundary conditions on an LxL lattice, so N=L 2 . We 
again are interested in the average lifetime (r) of the metastable state. The time units 
that correspond to physical time are Monte Carlo steps per spin, MCSS, as described in 
Sec. 2.0. The system starts with all spins up, and has an applied magnetic field H which 
is down (H<0). We consider only ferromagnetic nearest-neighbor interactions of strength 
J. The energy is given by Eq. (|l|). The total magnetization of the system is 

L 2 

M = £> i=1 . (34) 

i 

This model has a critical temperature fee 7c = 2J/ln(l + y / 2)~2.269 • • • J. We want to 
measure the average lifetime (r) for T<T C for the system to progress from the initial state 
with all spins up to a state with an equal number of spins up and down, i.e. to M=0. In 
this way we can study the dynamics of the model below the critical temperature, where 



the metastable decay of the model is dominated by nucleation and growth |20], |21], [56 . 

Even for this simple model the understanding of (r) is complicated. Two recent 
reviews of nucleation and growth analysis of metastable decay in the Ising model are 
presented in [El, |5"7|] . A review of earlier works on metastable decay is given in Ref. [J53]. 
More complicated Ising models, including other boundary conditions |59[, heterogeneous 



nucleation ||60|| , and an approximate demagnetizing field ]6T| have been studied as well. 



The Ising model with periodic boundary conditions has four length scales [^3| |24|. The 
lattice spacing a and the lattice size L are two of them. Another length scale is the 
radius of the critical droplet, R C (T,H), which is a function of temperature and applied 
field. Droplets with radii smaller than R c are subcritical and will most likely shrink, 
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while droplets with radii larger than R c are supercritical and will most likely grow. The 
remaining length scale has been called Ro(T,H) p3| , and it also depends on T and 
the strength of the applied field. This can be regarded as the average size to which a 
supercritical droplet will grow before it encounters another supercritical droplet. Note 
that Rq>R c always. 

These four length scales give rise to different types of decay mechanisms for the 
metastable state, depending on their inter-relationship. Only if R c and Ro are larger 
than a is the droplet nucleation picture applicable, so when R c <a the system is in a 
strong-field (SF) regime where there is no long-lived metastable state. For weaker fields 
and for large lattices, a-cR c <_R <^L. Then many supercritical droplets form during the 
decay to M—0. This is called the multi-droplet (MD) regime, and it is well described by 
the well-known Kolmogorov, Johnson- Mehl, Avrami theory [|62| , |63fl . For smaller lattices, 



where a<i? c <CL-C_R , only one supercritical droplet forms and grows to take the system 
to M—0, so this regime is called the single- droplet (SD) regime. For the square-lattice 
Ising model the crossovers between these various regimes are shown in Fig. [| for different 
values of L. Another regime, where L<R C is called the coexistence regime. It is not shown 
in the figure. 

The cross-overs between the different decay regimes and how they depend on H and 
T are of central importance |23|, ¥ZM. A conservative estimate for the cross-over between 



the SF and MD regime is where R c =a/2. This is shown as the heavy dashed curve in 
Fig. f|, and is independent of L. The cross-over between the MD and SD regime can 
be estimated by noting that the standard deviation of the lifetime, Ar=^ (r 2 ) — (r) 2 , 
approximately equals (r) in the SD regime and is much smaller than r in the MD regime. 
One consequence of this is that the measurement of (r) is self-averaging in the MD regime, 
but is very far from self-averaging in the SD regime. Hence, to measure (r) in the MD 
regime only a few realizations are required, and the 7^=1000 escapes used here are more 
than adequate. Conversely, in the SD regime the 7^=1000 escapes we average over are 
barely sufficient since the distribution of r has an extremely long tail. The estimate for 
the cross-over between the MD and SD regimes, called the dynamic spinodal [^, is 
given by the point where Ar=(r)/2. The location of the dynamic spinodal depends on 
L, and is shown for various values of L in Fig. ^. 

For low temperatures and \H\>2J, a single overturned spin is the nucleating droplet 
|65|j with an energy barrier given by T(H, J)=8J—2\H\. The dynamic spinodal (DSp) for 
the random-update dynamic in Sec. 5.1 can be found |)6| by equating the average growth 
time to the average nucleation time to give 

(i^> = ?l„(L)-0.82, (35) 

where the last term is the only fitting parameter and was chosen to provide agreement 
for values of L between 8 and 240. The factor of | changes if the dynamic changes, 
for example it is 1 for sequential updates pBfl . For a fixed field the dynamic spinodal 
approaches the T=0 

fcsTDSp = f ML) '-0.82 ' (36) 
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and without adequate finite-size scaling such a slow approach toward T=0 as a function 
of L could easily be misinterpreted as a phase transition. 

For low temperatures, as seen in Fig. |], the single-droplet regime is dominant. However 
for strong fields the discreteness of the lattice becomes important and the nucleating 
droplets are no longer circular. For arbitrarily low temperatures and \H\<2J with a 
fixed lattice size, the metastable decay is dominated by the SD regime, and the predicted 
lifetime is given by ||65| 



where 



fc B Tln((r low » = F{H, J) = 8J£ C - 2\H\ 



2J 



+ D 



\H\ 



(37) 



(3* 



and \x\ denotes the smallest integer not less than x. The nucleating droplet is a cluster 
of overturned spins of the stable phase which is a £ c x(£ c —l) rectangle with one additional 
overturned spin on one of the longest sides of the rectangle 
prefactor A that enters the average lifetime [p7[] as 



65 . There is a non-trivial 



(riow sd) = -4sd exp [T(H, J)/k B T] 



(39) 



The prefactor v4sd=5/4 for £ c —l and Asd=3/8 for £ c =2 [j67| . Closed forms for Asd for 
larger values of £ c have not yet been calculated. 

The three limits, L—>oo, T— >0, and \H\— >0 do not commute for the decay of the 
metastable state for the Ising model. Taking first L^oo places the system always in the 
MD regime for any finite temperature [see Fig. [| and Eq. (|36f)1. In this case the average 
lifetime is given by p8 



(now md) = ^MDexp 



T(H, J) + (U - 2\H\) 
3/crT 



(40) 



whenever £ C >1. The factor of 3 (which is d+1 in d dimensions) in the denominator is due 
to the multi-droplet nature of the decay and is the same as that for the MD regime for 
circular droplets J62], [63| . The additional term is due to the fact that at low temperatures 
for \H\<2J the growth of supercritical droplets is dominated by nucleation events on the 
surface of these droplets |68[ . 



The advanced algorithms to be described below perform differently in the different 
decay regimes. 



5.1 Standard Monte Carlo for the Ising Model 

The Ising energies of Eq. ([[]) are sufficient to obtain the statics of the model, such as the 
specific heat or magnetic susceptibility. However, this model does not have any dynamic 
since it is a classical model. A class of dynamics for this model can be derived using a 



Master equation approach |19], |20|, plU - Martin in 1977 [0] showed that it is possible to 
start with quantum spin | particles each coupled to their own particular heat bath, and 
to obtain the dynamic described in Sec. 2.0. 
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Note that there are an infinite number of possible dynamics that could obtain the 
correct statics with the energy of Eq. ([I]). For example, it is possible to use a Metropolis 
spin flip probability Jl], [|] given by Eq. (|3]) rather than the Glauber spin flip probability 
Eq. (H). It would also be possible to choose the spin in a particular order, perhaps 
sequentially. However, using one of these dynamics would lead to different values of (r) 



compared with the physically justified dynamic. For example, it has been shown |23], |24 
that the power of the H prefactor of the nucleation rate differs between the random 
update and sequential update dynamics, and the random update dynamic has an H 
prefactor consistent with the value expected from field-theory arguments |69|, [HJ. Thus 
the physically motivated dynamic of Sec. 2.0 should be used in cases where the dynamics 
is given a physical meaning. Furthermore, some of the advanced algorithms such as the 
n-fold way and MCAMC algorithms work only for the random update dynamic — or 
would have to be modified significantly for the sequential update dynamic. 

The standard dynamic Monte Carlo for metastable decay of the Ising model is: 

• Start the lattice with all N spins up. 

• Set the time t to zero. 

• Perform the following steps until the system reaches a magnetization M=0: 

o Increment the time t to t+1. 

o Calculate two random numbers, f and f . 

o Randomly choose one of the N spins using f. 

o Calculate (or look up) the current energy E oid and the new energy E new if the 
chosen spin were to flip. 

o Calculate (or look up from a table) the Glauber flip probability. 

o Use Eq. (|2]) with f and either flip the spin or retain the old spin configuration. 

This algorithm is executed K times, and the statistics for the mean lifetime are accumu- 
lated. A single algorithmic step is a Monte Carlo step (mcs), while the time that should 
be proportional to the physical time is a Monte Carlo step per spin (MCSS). The average 
lifetime, calculated for i^=1000 escapes, is shown for two different fields in Fig. ||(a) and 
Fig. 1. 



5.2 n-fold Way for the Ising Model 

The n-fold way algorithm introduced by Bortz, Kalos, and Lebowitz uses spin classes 
for the Ising model. For the isotropic square-lattice Ising model with periodic boundary 
conditions in finite field, every spin belongs to one of n=10 different classes. These classes 
are organized by the spin orientation (up or down) and the number of nearest-neighbor 
spins that are up (which can be 0, 1, 2, 3, or 4). The energy of every spin in a particular 
class is the same, so the probability of flipping any spin in a given class is the same. The 
10 spin classes are listed in Table 1. Note that for escape from the metastable state we 
start with all spins up so all spins are initially in class 1. When an up (down) spin flips, it 
changes its spin class by +5 (—5). Furthermore, when an up (down) spin flips, all four of 
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Class 


Spin 


Number of 


—E \d 


-E 


Number 


Orientation 


nearest-neighb or 
spins up 






1 


T 


4 


AJ + H 


-AJ-H 


2 


T 


3 


2J + H 


-23 -H 


3 


T 


2 


+H 


-H 


4 


T 


1 


-2J + H 


2J-H 


5 


T 





-AJ + H 


AJ-H 


6 


I 


4 


-4J-H 


AJ + H 


7 


1 


3 


-2J-H 


2J + H 


8 


i 


2 


-H 


+H 


9 


i 


1 


2J-H 


-2J + H 


10 


i 





AJ-H 


-AJ + H 



Table 1: The 10 classes of Ising spins for the square-lattice nearest-neighbor model. 



its nearest-neighbor spins change their spin classes by +1 (—1). Also shown in Table 1 are 
the new and old local energies of the spins — remember J>0 while H<0 for our escape 
from the metastable state. 

Let rii be the number of spins in class i. One always has 
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N = (41) 



i=i 



Let pi be the probability of flipping a spin in class % using the dynamic such as the Glauber 
dynamic of Eq. given that the spin was chosen during the random choice portion of 
the dynamic. Then the probability in one time step of flipping any spin in class % is 
riiPi/N since rii/N is the probability of randomly choosing a spin in class i and Pi is the 
flip probability once a spin in class % has been chosen. In order to exit from the current 
spin configuration, a spin in one of the 10 classes must flip. Consequently, the absorbing 
Markov chain has one transient state and 10 absorbing states. In general, the n-fold way 
has n classes for a particular discrete spin model, and there are n absorbing states and 
one transient state, the current spin configuration. 



The original n-fold way algorithm was written in continuous time [33|. It was recast 



in a form for discrete time in 1995 [34|. We will use the discrete-time version because it 



allows for a much easier generalization to more than one transient state. Introduce 

1 * 

Qi = m n iPi l<i<n (42) 
and define Qo=0. The absorbing Markov matrix for exiting from the current spin config- 
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uration is 



M n+1 = — 



1 

N 
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n 7 p 7 


n 8 P8 


n 9 P9 


niopio 


TW 

(43) 



where Ti i i=A r (l — Qio) is the transient matrix, here with a single element. 

The algorithm to perform the n-fold way for escape from the metastable state is: 

• Start the lattice with all iV spins up. Set n\=N and nj=0 for i>l and also set 
M—N. 

• Set the time t to zero. 

• Perform the following steps until the system reaches a magnetization M=0: 

o Calculate three random numbers, f, f, and r. 
o Calculate the 10 values for Qi using Eq. (f42p. 

o Calculate the time m (in units of Monte Carlo spin flip attempts) to exit from 
the current spin configuration (state) using 

ln(f) 



m 



1 . 



(44) 



_ln(l " Qio) 

o Using the random number f, find the spin class k that satisfies 

Qk-i < rQw < Qk ■ (45) 



o Use the random number r to pick one of the nk spins from spin class k, and 
flip the chosen spin. 

o Change the number of spins in class k to n^— 1. 

o If the flipped spin was up (down), change the number of spins in class nfc +5 
(n fc _ 5 ) by +1. 

o If the flipped spin was up (down), for each of the four nearest-neighbor spins 
initially in class j change the number of spins in class j to be rij— 1 and the 
number of spins in class n J+1 (n 3 -_i) to be n^+i+l (n^_i+l). 

o Set the time t to t=t+m. 
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A number of remarks are in order. First, just as in the n-fold way for the simple ion 
channel, the time to exit the current state comes from using Eq. (0) with the Markov 
matrix of Eq. Since there is only one transient state, the equation 



1 - Qio) m < r < (1 - Q 



\ra-l 



10 i 



(46) 



reduces to Eq. (f|4]). Second, the state to which the current spin configuration changes 
comes from using Eq. @ with the Markov matrix of Eq. (|43|). Since there is only one 
transient state, the factors of T m_1 in Eq. ([|) are scalars and cancel, and the normalization 
in Eq. (|9]) becomes v^Tte=Qi . 

To recover the original |$3| continuous-time result, note that Qio<l always, and usu- 
ally Qio<^l for most spin configurations near a minimum of the free energy (i.e. near a 
metastable or stable state). Expanding Eq. (H) gives 



ln(r) 
ln(f) 



1 



ln(l 
1 



Q 



10; 



+ 1 



+ —Q 

24^ 



2 + 

10 + 



19 
720 



Q 



3 4- ■ ■ 

10 + 



+ 1 



Keeping only the first term in Eq. ( |47| ) gives the time increment 

ln(f) 



m 



At 



Q 



10 



(47) 
(48) 

(49) 



with the time units in Monte Carlo steps (mcs). To change the time units to physical 
time units (MCSS) for At, one needs to divide Eq. ([49]) by N. The time unit is trivial 
to change to MCSS in the continuum n-fold way since it just cancels the factor of N in 
the definition for the Qi, Eq. (f42|) , and gives the result in Ref. |33| (where the authors 
define their values of Qi without the 1/N normalization). However, no such fortuitous 
cancelation between m and N during a time unit change occurs in the discrete version of 
the n-fold way, Eq. 



There is an efficient way [33] to keep track of the classes of all the spins that uses four 



arrays. (See Ref. [Q for the use of three arrays and the additional packing of information). 
The array NNCLS(IO) contains the number of spins Tij in each of the 10 classes. The two- 
dimensional array L0C(iV, 10) contains the LOCation of the spins in the lattice for each of 
the 10 classes of spins, where the LxL lattice is considered as a one-dimensional array of 
spins. There is no particular order within a class of spins, and only the NNCLS(i) spins in 
L0C(j ,i) contain spin locations in current use. (Adjustable partitions in a single array 
of length N could be used for L0C |33|] , but in FORTRAN a price is paid in terms of the time 
required to do the bookkeeping involved in changing the adjustable partitions after every 
time step. Here C and C++ programmers have an advantage.) Array LOOK (AO determines 
the index of the first dimension in LOC for each of the N spins, and the array L00C(iV) 
is used to determine the class of a spin in LOC, given its position in the lattice. Hence 
L0C(L00K(i),L00C(i)) = i. 

Suppose we have chosen class k using Qk-i^Q w <Q k and want to flip one of the 
spins in this class. We first calculate a uniform random integer j between 1 and NNCLS(k) 
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and find the spin location i = L0C(j, k). Then we need to update all four arrays for the 
change in classes of the ith spin and its four nearest neighbors. This indexing provides a 
fast method for changing the classes of spins in the n-fold way algorithm |33], [TT|j . 

The CPU time required for the data presented in Fig. ||(a) is presented in Fig. ||(b) and 
the CPU time required for the data presented in Fig. |] are shown in Fig. |7|. Note that the 
n-fold way algorithm is significantly faster at low temperatures (by orders of magnitude!) 
than the standard dynamic Monte Carlo algorithm. For H=—3J the CPU time required 
for the n-fold way simulation is roughly independent of the temperature for any value 
of L. This is because for 2J<\H\<AJ a single overturned spin is the nucleating droplet 
|65|j . For \H\<2J, as in Fig. |6], the CPU time required for the n-fold way simulation 
increases as the temperature is lowered since the nucleating droplet includes more than 
one overturned spin []65|1 . See Fig. [7[ 



5.3 MCAMC for the Ising Model 

Although the n-fold way algorithm is extremely fast for 2J<\H\, it is possible to further 
decrease the amount of CPU time required for a given simulation with \H\<2J by using 
more states in the transient subspace. In particular, at low temperatures where the energy 



of the nucleating droplet is known |65| to be given by Eq. (|37| ), it is possible to use higher 
s MCAMC only for the exit from the state with all spins up and use s=l MCAMC (n-fold 
way) for the other moves. This is similar to the strategy used in the MCAMC for the 
ion-channel model of Sec. 4.3. 

For s=2 MCAMC the transient matrix is 



i _ [P6+4p 2 +(A f -5)pi] 
1 N 



and the recurrent matrix is 




(50) 



H~ = {N -J' )n )- (51) 



JV \ 

The first transient state (in the lower right-hand corner of the matrix) is the initial state 
(all spins up), and the second state is all the N spin configurations with one overturned 
spin. The first absorbing state corresponds to all 2N spin configurations with two nearest- 
neighbor overturned spins, and the second to all N(N—5)/2 spin configurations with two 
overturned spins that are not nearest-neighbor spins. In the MCAMC algorithm, if the 
spin configuration has two or more overturned spins, the normal n-fold way algorithm is 
used. If the system returns to the initial state (all spins up), the absorbing Markov chain 
part of the algorithm using Eqs. ( |50"1) and (|5lD is performed. 

For s=3 the transient matrix for starting from the state with all spins up is 

( N -2p 7 - 6p 2 — (N — 8)pi 2p 7 \ 



1 

N 



4p 2 N-p e -4p 2 -(N - 5)pi p 6 

N Pl N(l - pi) J 



and the recurrent matrix is 

1 /(-V-S)y^ 



(52) 



N 




(JV-5)pi I . (53) 
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Here the transient state added to the s=2 MCAMC is all 2N spin configurations with 
two overturned nearest-neighbor spins. The absorbing states correspond (left to right) 
to all the spin configurations with: 1) two nearest-neighbor overturned spins and one 
not-connected overturned single spin; 2) a linear chain of three overturned spins; 3) an 
L-shaped arrangement with three overturned spins — which corresponds to the nucleating 
droplet at low temperatures for J<\H\<2J which corresponds to i c =2 of Eq. fl38|) ; 4) two 
not-nearest-neighbor overturned spins. 

After exiting from the absorbing Markov Chain portion of the algorithm, the spin 
configuration on exit must be decided on since each of the absorbing states corresponds 
to a large number of spin configurations with equal absorption probabilities. The method 
used in the n-fold way algorithm to decide on which particular spin within a chosen spin 
class is generalized to accomplish this. For example, the first absorbing state for the s=3 
MCAMC uses three random numbers, r^, to construct the exiting spin configuration. First 
T\ is used to pick uniformly one of the N spins and this spin is flipped to down. (With 
periodic boundary conditions this step is not strictly necessary, but is advisable to avoid 



any correlations between the random number and the algorithm |72| — particularly if the 
exit probability is low.) Then r 2 is used to pick uniformly one of the 4 nearest-neighbor 
spins of the flipped spin and this spin is flipped. Finally, r 3 is used to pick uniformly one of 
the N— 8 spins, not including the two flipped spins or their 6 nearest-neighbor spins, and 
this spin is flipped. This gives the exiting spin configuration for this particular absorbing 
state. A similar procedure, using as many random numbers as required, is performed if a 
different state is the absorbing state. 

The time m to exit the absorbing Markov chain is given by using a random number f 
and Eq. (|7|) with m satisfying 

vjT m e<f <vfT m ~ l e. (54) 

Remember that m is in units of Monte Carlo steps (mcs), while the physical time is 
in units of Monte Carlo steps per spin (MCSS). This equation does not simplify to an 
equation in closed form as it did for the s—1 case, Eq. fl34j) . However, it can be solved 
iteratively using for example a bracketing and bisection method ||. It is critical for issues 
of algorithmic speed to obtain a reasonable guess for m and brackets for m for a given 
absorbing Markov chain and a given f. This can be done by assuming a form such as 



Eq. fl4HD with upper and lower bounds |73] for the eigenvalues of the transient matrix. 

The choice of which state the MCAMC algorithm absorbs to is given using a random 
number f and the vector q of Eq. (||). If the elements of q are given by % then the 
absorbing state j is the solution of the equation 

i-i i 

^2qi<f <J2 ( li ( 55 ) 

i=l i=l 



with the first sum being set to zero if j=l. This corresponds to Eq. (f45|) for the n-fold 
way algorithm. For the particular s=2 MCAMC of Eqs. flSOf) and ([51]), the order of the 
calculation of m and the exit state is not important, just as the order is not important 
for the n-fold way algorithm. This is because there is only one transient state that the 
absorbing Markov chain can exit from. In this case the explicit m dependence in Eq. (||) 
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Algorithm 


To 


r-r 


A 


J/kftT of fit point 


Monte Carlo 





UJ-U\H\ 


1.21xl0- 8 


1.231 


n-fold way (s=l) 


8J-2\H\ 


16J-12\H\ 


1.32xl0- ? 


2.1 


s=2 MCAMC 


12J-4\H\ 


12J-10|if| 


1.44xl0- 6 


2.857 



Table 2: The values used to fit the required CPU time for different MCAMC algorithms 
in Fig. |7|. The fit functions are valid for £ c =3, so g J<\H\<J. 



cancels in the numerator and denominator. However, in general this is not the case and 
Eq. (P) and Eq. d55|) must be used to calculate the absorbing state one must exit to. 

If a MCAMC algorithm for fixed s is not performing very well, it can always be im- 
proved upon by adding more states to the transient subspace. A natural question is which 
state(s) should be added first. The more transient states there are, the longer calculations 
using Eq. (0) and Eq. (0) will take — as well as there being a higher probability that a 
programming error is made in constructing either the Markov matrix or the exiting spin 
configuration. So the states should be added to the transient subspace in an intelligent 
fashion. The prescription for finding which absorbing state(s) should be added to the 
transient subspace is found using Eq. (|ll|). Whichever state(s) have the largest probabil- 
ities of absorption should be added to the transient subspace. Of course this will depend 
on the model and the model parameters — for example in metastable decay for the Ising 
model the absorbing state with the highest probability in Eq. ( |TT|) depends on L, \H\, 
and T. 

For very low temperatures, the value of m for each exit can be very large. It is then 
important not to let the discreteness of the random number used to obtain m affect the 
final answer for m. This is important for both low-temperature standard Monte Carlo in 
which the Pi can be very small, as well as for all s-state MCAMC algorithms. In practice, 
I accomplished this by using another random number distributed uniformly between 
and 0.01 whenever the first chosen random number was less than 0.01. The factor of 0.01 
is arbitrary and will depend on the number of bits of the random number generator in 
use. Furthermore, to go to very low temperatures (and hence large lifetimes) I found it 
useful to code using routines that can be set to a large precision. I used the package 



MPFUN [73 1 



Fits to the required CPU times can be obtained at low temperatures by using the 
energy difference between the nucleating droplet and the most probable absorbing state. 
For the data in Fig. ff], these are listed in Table 2. 

Average lifetimes using s=2 and s=3 MCAMC are presented in Figs. |5](a) and || 
The CPU time required for these MCAMC algorithms for | if | =0.75 J is shown in Fig. [FJ. 
For \H\<2J and low temperatures, the higher s MCAMC algorithms can perform many 
orders of magnitude faster than the s—1 MCAMC algorithm, which is itself many orders 
of magnitude faster than the standard dynamic Monte Carlo algorithm! For example, 
at /cbT=0.25J the s=3 MCAMC algorithm is about 10 16 times faster than the standard 
algorithm! 
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5.4 Projective Dynamics for the Ising Model 

The MCAMC algorithm performs extremely well in the low-temperature and strong-field 
regime, i.e. in the single-droplet (SD) and coexistence regimes at low temperatures. How- 
ever, MCAMC with small s does not perform very well at higher temperatures or for 
large system sizes. This is because the average time increment m given by Eq. fl54|) is 
relatively small for large T or large L. In principle it would be possible to construct higher 
s MCAMC that would work in this regime, but this would require a lot of bookkeeping 
of the large number of transient and absorbing states. Is there another advanced simu- 
lation method that can be used in the multi-droplet regime? For the simple ion-channel 
model of Sec. 4.4, the solution was found by using the growing and shrinking probabilities. 
However, for an iV-spin Ising model the underlying Markov matrix is 2^ x 2 N with compli- 
cated transition probabilities between the 2 N states. In continuous time the probability 
P({a},t) of the iV Ising spins being in state {a} at time t is given by 

dP( t ht) = Er(WIK})P(W,t) , (56) 

m W'} 

where rdcrjllo"'}) is the probability per unit time of moving to spin configuration {o~}, 
given that the current spin configuration is {er'}. 

The idea behind the Projective Dynamics (PD) method is that one still expects that 
a simple one-dimensional physical picture of the free energy, Fig. |8|, might hold in this 
complicated situation. In particular, one considers lumping together all states with a total 
magnetization M to form a (iV+1) x (iV+1) lumped Markov matrix. Mathematically, the 
Markov matrix with 2 N states is not even weakly lumpable J75|, but is most likely almost 



weakly lumpable with respect to the states along the escape from metastability. Thus for 
the Master equation on the N+l states of a particular magnetization we get 

dP{ ^/ ] = E W(M\M')P(M',t) . (57) 

01 {M'} 

Note that due to the single-spin flip dynamic there will only be non-zero transition proba- 
bilities between states that have magnetizations that differ by one overturned spin. Then 
the question is how to define the lumped transition probabilities W{M\M'). One possi- 
bility is to assume an Arrhenius form |76| for the transition probabilities for the lumped 



states [^, [77| . This will give the correct free energy for the static model, and actually 



gives a reasonable approximation for (r) in the single-droplet regime [^, 77] . It has been 
extended to projection onto two dimensions, magnetization and energy |7_8J [79]. How- 
ever, in the multidroplet regime it is important to also get the growth phase correctly, 
and this is a non-equilibrium phenomenon. This leads to poor agreement between the 
average lifetime obtained from standard simulations and by using the Arrhenius form for 



the shrinking and growing probabilities on the lumped states |66| . 

However, it is possible to actually measure the lumped transition probabilities during 
a sequence of K simulated escapes from the initial all-spin-up configuration. The lumped 
matrix is tridiagonal due to the single-spin-flip character of the dynamic. Define (cj)m 
to be the average number of spins in class i during the simulated escape, given that the 
configuration has magnetization M. In terms of the flipping probability of a spin in class 
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i, the growing rate of the stable phase (spins down) is 



5 

g(M)=J2(ci)MPi, (58) 
i=i 

and the shrinking rate of the stable phase is 

10 

s(M) = J2( c i)MPi ■ (59) 

Note that the (ci)m should be measured during the actual escape from the initial spin 
configuration. Then the 'average' transition probability for the lumped matrix is 

W(M\M') = s(M')5 M+2>M , + g(M')5 M ^ M ' • (60) 

In particular, the probability of going from lumped state M to lumped state M' is given 
by 

^( M I M ') = ^U £ E r(WIM), (6i) 

where n(M) is the number of states with magnetization M. 

The random walk starts at M=L 2 and terminates at M—0. We define h(M) as the 
total time spent in M. If L is even, h(M)=0 for M odd, since these magnetizations are 
not possible. Then, in analogy with Eqs. ( |3"2j ) and (|33|), one has 

An error estimate for (r) can be obtained from error estimates of the h(M). Assume 
error estimates for g(M) and s(M) have been obtained, for example by using a jack-knife 
procedure p0|. Then using standard error propagation |81|, the error in the lifetime is 



L 



2 



(5rf = £ [5h(M)f (63) 

M=2 



with 



and iteratively 

Sh(M) = -±^{[l- s (M-2)h(M-2)] 2 (5g(M)) 2 
+ [h(M - 2)] 2 (Ss(M - 2)) 2 

+ [s(M-2)] 2 (5/i(M-2)) 2 }^ . (65) 

How close to the actual lifetime is the average lifetime measured by using Eq. (|62|) ? 
The answer is that they are identical to within statistical errors! This is true even though 
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the original Markov matrix on 2 N states is not lumpable. For the Ising model this can 
be proven using class populations ||82|| . However, there is a more transparent proof [|S3[] . 
Assume that all the g(M) and s(M) were correct. Since g(2) is correct, from Eq. fl62|) 
h(2) is correct. Since h(2), s(2) and (4) are correct, from Eq. fl62[) /i(4) is also correct. 
This can then be iterated to obtain the exact value of (r) . Note that although the exact 
value of (r) is obtained, only approximations to the higher moments of r are obtained. 
The lumped growing and shrinking probabilities also allow one to numerically inves- 



tigate the 'free-energy' landscape during the escape from the metastable state [S3, |S5|j . 
Figure [5] shows these growing and shrinking probabilities for a chosen set of parameters 
for the square-lattice Ising model. Wherever g(M)=s(M), the probability of growing 
is equal to the probability of shrinking, and that value of M must correspond to an 
extremum during the motion. In terms of the coarse-grained 'free energy' during this es- 
cape, these correspond to the metastable state, the saddle point, and the stable state. In 
more complicated models the locations where g(M)=s(M) also represent extrema. This 
fact allowed |35] both the Barkhausen volumes and activation volumes during thermally 
assisted domain-wall motion in a model for magnetization reversal in Fe sesquilayers on 



W(110) to be obtained [86|. Using Eq. fl62|) to obtain h(M) and assuming that just as in 



equilibrium the 'free energy' during the escape from the metastable state is 

T{M) oc — In [h{M)\ (66) 

gives a way to measure the 'constrained free energy during escape from the metastable 
state,' Fig. [U], similar to Fig. ||. 

It is also possible to introduce a moving wall in the magnetization to force the system 
out of the metastable state, and to measure g(M) and s(M) during this (hopefully quasi- 
static) process |37[. The wall position is given by 



M wall (t) = (L 2 + l)-t; wall t. (67) 
For a soft wall the normal Monte Carlo flip probability is multiplied by 

Pwaii = exp [-c (M ncw - M waU )] (68) 

if the wall at position M wa u is past the magnetization of the Monte Carlo new trial 



configuration |pq|. Otherwise there is no change in the flip probability. Another possibility 



is to introduce a hard wall, c=oo, that always flips a chosen up spin if M wa n is past the 



current magnetization. Fig. [XT] shows an example of lifetimes for a hard forcing wall as a 
function of the wall speed. The difference between the lifetimes measured from Eq. ( |B"2"D 
with the number of Monte Carlo Steps per spin before escape with the wall gives an 
estimate for the speed-up due to incorporating the forcing wall. Of course, for a wall 
that moves too fast the process is not quasi-static, and the lifetimes are not accurate. 
However, good results are obtained even for relatively fast walls. For example, for the 
hard wall of Fig. [TT] a velocity of 3xlO _6 M/L 2 MCSS gives a lifetime comparable to an 



actual escape, but requires almost five orders of magnitude less computer time. The soft 
walls do not seem to help the convergence, and additional computations are needed at 
each step to calculate p wa ii. This often makes the hard wall more computationally efficient 
than the soft walls |88 |. For example, for the same parameters as in Fig. |ll| with c=0.01 
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with a speed of 1.0xlO~ 5 M/L 2 MCSS the lifetime was 2.6xl0 9 MCSS, had not yet become 
independent of the forcing speed, and required about 2.1 minutes of Cray SV1 CPU time 



for each escape. This should be compared with the same forcing speed in Fig. [TT] for the 
hard wall, which required only about 0.06 Cray SV1 CPU minutes for each escape. 

In some cases the class populations (cj) during escape from the metastable state can 



be approximated by the class populations in equilibrium, (cj) cqu ||84|| . These equilibrium 
class populations can be measured during an equilibrium Monte Carlo simulation 
Then using Eq. fl6"2|) with this assumption the lifetime can be obtained at all tempera- 
tures and fields since the dependence of the flip probabilities on T and H are known. The 
approximation for the (q) actually becomes better as \H\ becomes smaller since then the 
nonequilibrium configurations play less of a role in the metastable decay. In this way life- 
times of about 10 30 Monte Carlo Steps per spin have been obtained for Ising ferromagnets 



in two and three dimensions for small H §4jj. In addition, with the assumption that the 



class populations do not change with a change of dynamics, values of (q) obtained using 
one dynamic [say Eq. (fj)] can be used to obtain lifetimes using a different dynamic [say 
Eq.®]. 



As seen in Eqs. ([58]) and (59), the growing and shrinking rates are functions of both the 
average of the spin classes, (cj), and the single-spin flip probabilities, Pi. The functional 
form for pi is given by Eq. (|2]) using the energies in Table 1. Hence the functional 
form for the pi are known for all T and H values. One could assume that the average 
class populations do not change too much with changing field or temperature. Then the 
class populations at one simulated value of T and H can be used to obtain growing and 



shrinking probabilities at other values of H and T. For the parameters in Fig. |12| this 
assumption produces the wrong functional form for (r). This is because for a growing 
interface the class populations are not constant ||89|| . An alternative assumption is that 
the average class populations can be given by the simulation results at k-sT=J except for 
magnetizations near the metastable well, where class populations from equilibrium Monte 



Carlo simulations |>2, [84| can be used. As seen in Fig. [12] this combination of assumptions 
works well for the chosen parameters over almost the entire temperature range. This 
is because at low temperatures the growth phase contributes very little to the average 
lifetime. Furthermore, for our fixed L and H, the most probable equilibrium configuration 
is nearly the same as the most probable path of the nucleating droplet at fixed M as T 
becomes small. Such techniques must be used in conjunction with knowledge about the 
physics of the escape from the metastable state to be accurate, as Fig. [12] demonstrates. 



Simple approximate class populations during the growth phase |89[] could alternatively be 
used for the growth portion of these types of calculations. 

It is also possible to approximate the growing and shrinking probabilities of a system 
of size 2N from a system of size iV ||84|| . Since the relevant configurations typically 



contain many small droplets, to a good approximation the larger system can be viewed as 
consisting of two 'independent' copies of the smaller system. In this approximation one 
can write that 

^ Y$, =M h(N, N + M- M')h(N, M') [g(N, N + M - M') + g(N, M' - N)] 
1 ' '~ Ew= M h(N,N + M-M')h{N,M'-N) 

(69) 
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and 



s(2N, M) 



2^M> 



zM h(N, N + M - M')h(N, M') [s(N, N + M - M') + s(N, M' - N)] 



Zm>=m KN, N + M- M')h{N, M> - N) 



(70) 

where the explicit dependence on the volume has been included. Remember that for iV 
even, the odd magnetizations can be included since they have zero residence times. Note 
that the cut-off in this extrapolation does not remain at the original value of M, but rather 
is a constant in the number of overturned spins on a lattice of any size. Using Eqs. ( |69"D 
and ( |70"D repeatedly, one can extrapolate to very large lattices. While the lattice sizes 
extrapolated to are nowhere near world record sizes |Kj, much additional information has 



been obtained from the small-lattice simulations. This is illustrated in Fig. |13 . 

These types of projective dynamics techniques with a forcing wall can also be used on 
other discrete spin models and on other lattices to obtain extremely long lifetimes pi]. For 



example, Figure [14] shows results for the simple-cubic Ising ferromagnet on a 32 3 lattice. 
The temperature and field are such that (r)=7.3xl0 15 MCSS. This lifetime is more than 
10 6 times longer than a recent large-scale simulation of the model using standard Monte 



Carlo simulations Here the inverse speed of the hard forcing wall was 5000 n-fold 
way updates per spin per overturned spin. For the square lattice at low temperatures 
at fixed 
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L, the nucleating droplet has neither a square nor a circular shape 
Similarly, for the simple cubic lattice the nucleating droplet at low temperatures for fixed 
L has a particular shape which is neither a cube nor a sphere [95], |96 |. These discrete 



lattice effects cause the oscillations in the quantities seen in Fig. [14 



5.5 Parallelization of Dynamic Monte Carlo 



With the widespread availability of massively parallel computers, with the number of 
processing elements (PEs) on the order of hundreds to thousands, the question arises as 
to how dynamic Monte Carlo simulations can be implemented efficiently on these parallel 
computers. Often the parallelization can be accomplished in a trivial fashion, just by 
letting each PE run its own copy of one of the K escapes. Then no communication 
between PEs must be performed until the escape is finished and statistics are collected. 
However, sometimes the lattice size to be simulated is too large to fit onto a single PE, 
or fitting the simulation on a PE will use too much swapping of virtual memory, leading 
to long execution times. In such cases a non-trivial parallelization scheme is needed. 

The initial impression might be that dynamic Monte Carlo is a scalar algorithm. 
However, in 1988 Lubachevsky |97] showed that the dynamic Monte Carlo simulation for 
the Ising model belongs to a class of problems called discrete-event simulations and can 
be parallelized. As mentioned in Sec. 4.2, a discrete-event simulation [|54|] is a problem 
in which the state vector changes discontinuously at certain (perhaps random) times. 
Discrete-event simulations are used in an extremely wide range of applications in science, 
technology, manufacturing, and sociology. Just a few recent examples include preventative 
veterinary medicine |[98|| , resource allocation following a terrorist bombing p9| , modeling 
concrete supply and delivery ||100|| , aircraft design [101], ecological models ||102| , |103 



design of logical circuits ||104| , |105|| , manufacturing systems [ 106 , battlefield simulations 



|[L07|| ) and simulation of wireless services [|108|[ . For the Ising model the method of paral 
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lelization is to place a block of spins, say £x£ in two dimensions, on a processing element. 
Then the total number of spins on the lattice is N=L 2 =£ 2 Npe where iVpE is the number 
of PEs used in the simulation. A block of spins and its neighboring spins from nearest- 
neighbor blocks on a square lattice is shown in Fig. 15], shown for £=6. For £=6 there are 
16 interior spins and 20 boundary spins on each PE. 

To parallelize the dynamic Monte Carlo algorithm, each PE has its own time clock 
with its own simulation time. This local time is called the 'virtual time' of the block 
of spins IP- 0EH| . Remember that the entire system has a single time — but for parallel 
implementation each PE has its own clock with its own virtual time. To calculate some 
quantity at a particular time, for example the spin configuration at time t, the simulation 
must evolve until the clocks on all the PEs are at time t. However, if one is interested 
only in advancing the spin configuration over time (as opposed to measuring the spin 
configuration at a particular time) each PE can independently advance the spins in its 
block and update its time. 

Of course the blocks of spins on nearest-neighbor PEs are not independent. Conse- 
quently the parallelization must ensure that causality is not violated as the simulation 
progresses. The conservative approach [97] to avoid causality violations is to have a PE 
idle (wait) until it is guaranteed that an update will not violate causality. If the time on a 
PE's clock is behind that of all of its nearest-neighbor PEs, then there is no possibility that 
causality is violated, and the PE can advance its spin configuration and time. Another 
approach to avoiding causality violations, called the speculative or optimistic approach, 
is to assume that no causality violations occur and have every PE always advancing its 
spin configuration ||109| , |110| . Of course at some point a causality violation will occur for 
a particular PE. Then it must roll back to a previous time and spin configuration before 
causality was violated. This rollback mechanism can then cause causality violations and 
hence rollbacks on the nearest-neighbor PEs. Hence such rollbacks can cascade [11-114] 
over the lattice. 

We will only describe parallelization of the dynamic Monte Carlo for the Ising model 
using the conservative approach p7| . The same approach can easily be applied to other 
dynamic models, including dynamic classical models with continuous degrees of freedom. 
Each PE advances both its clock and its £x£ block of spins if and only if its virtual time 
is not greater than the virtual times of all its nearest-neighbor blocks. Otherwise the PE 
idles until its virtual time is not greater than those of its nearest-neighbor PEs. In other 
words, if the virtual time for a PE is not a local minimum of the 'virtual time' surface 
then it executes a wait until directive until its virtual time is a local minimum. The 
'virtual time' surface is the surface of 'virtual times' on all the PEs. This algorithm is free 
of deadlocks, because at worst the PE with the global minimum virtual time can advance 
its virtual time. 

The checking restriction of virtual times can be relaxed for £>1 so that once a PE 
chooses a boundary spin to try to flip it checks the virtual time only of the single PE (or 
the two PEs for a corner spin) that the chosen boundary spin has its nearest-neighbor 
spin on. If this neighboring PE's virtual time is less than the updating PE's virtual time, 
then the updating PE waits until its virtual time is less than this neighboring PE's 
virtual time before the update attempt is made. 

Figure [16] shows the update rate in Monte Carlo steps (mcs) per second on a Cray T3E 
as a function of the number of processing elements (Npe) used. This is based on work 
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reported in Refs. [|115| , |1 16[] . The standard dynamic Monte Carlo algorithm of Sec. 5.1 was 
run using iVpE values of 8, 16, 64, 128, and 400, a fixed block size with £=128, T=0.7T C , 
and |if |=0.2857J. The size of the simulated lattice was L- 
simulated had L- 



=£y/N PE , so the largest lattice 
Figure [16] shows that the algorithm seems to be scalable — in 



=2560. 

other words, the update rate increases linearly with iVpE as the number of PEs and the 
lattice size L increases for a fixed T, H, and £. 

One important question in parallel discrete-event simulations using the conservative 
approach is the utilization of the PEs. In other words, what fraction of the PEs are not 
idling at a particular time? This can be answered by using non-equilibrium surface science 
methodologies ||117| , |118|| to study the average utilization, (u), of the parallel algorithm. 
The value of (u) for the standard dynamic Monte Carlo simulations of Fig. [31| are shown 



in Fig. [17 



For the worst-case scenario for the average utilization, £=1 (one spin on each block), 
it is possible to study the average utilization using finite-size scaling of simulations of the 
virtual time to obtain information about (u) as iVpE— >-oo [ |1 1 7| , |119|| . Fig. [IT] shows the 



result obtained for a square lattice ||119|1 from finite-size scaling of simulation data. Much 
more information about the virtual time surface can be obtained for a one-dimensional 
lattice (with periodic boundary conditions). In Ref. ||117|| it was shown that the virtual 
time surface for the conservative implementation of parallel discrete-event simulations 
is governed by the Edwards- Wilkinson Hamiltonian ||120| , |121|| . Universality arguments 
then prove that (u) must be bounded away from zero as iVpE— >oo. This proves that in 
one dimension the conservative implementation of parallel discrete-event simulations are 
scalable in the worst-case scenario! This is a very general result for all one-dimensional 
lattices, regardless of the actual physical problem being simulated. In one dimension 
precise finite-size scaling can be used to show that (u) is just slightly less than | in the 
worst-case scenario. This value is also shown in Fig. [I7|. The fact that (u) is finite, 
and hence the algorithm is scalable, is a non-trivial result. It has been shown, including 
exact analytic results, that there are other surfaces from reasonable growth models where 
the density of local minima approaches zero 
approach to the iVpE— >oo limit in one dimension shows that 



122]. A finite-size scaling analysis of the 
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(71) 



for the £=1 case ||117 



nonequilibrium surface 
density of local minima is 



This is the scaling form for the density of minima for this type of 
1231 



In addition it was shown |117| that the fluctuations in the 
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(72) 



PE 



for the one dimensional £=1 case 11117 



Another intrinsic property of the virtual time surface is its width as iVpE-^oo. This is 
important because, in order to measure a spin configuration at a particular time t, all PEs 
must wait (idle) once they reach time t until all the PEs have caught up. If the virtual 
time surface has a width which diverges as iVpE— >oo, then this measurement process will 
not scale. In Ref. [[L17|] it was shown that indeed the interface width diverges in one 
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dimension since the virtual time horizon belongs to the Edwards- Wilkinson universality 
class. In higher dimensions numerical studies also indicate a divergence of the width of 



the virtual time surface ||119|| . It has, however, been shown that a slight change in the 
algorithm ||118|| can produce both a scalable algorithm and a virtual time surface with a 
finite surface width. 

Another way to handle measurements is with a buffer of spin configurations. These 
spin configurations are stored once the virtual time on a PE is equal to a particular time 
at which a measurement is to be made. As long as the buffer is finite and the virtual time 
surface width diverges as iVpE— >oo, this measurement procedure will also not scale. 

The inner loop of the parallel dynamic Monte Carlo algorithm is executed on each PE 
in parallel. Each PE updates an £x£ block of spins, and each PE has its own clock to keep 
its virtual time. The parallelization is much easier in continuous time since in this case 
no synchronization is required because the virtual times on neighboring PEs cannot be 
equal. The simulation is started with all spins up and each PE has its virtual time set to 
zero. Then each PE determines the time At=— ln(f), where f is a uniformly distributed 
random number, of its first update attempt. The inner loop (written here specifically for 
a square lattice) is then executed repeatedly: 

• Select a spin from the £ 2 spins on the block with equal probability. 

• If the chosen spin is: 

o One of the [I— 2) 2 spins in the kernel of the block proceed to the next step. 

o One of the 4£— 4 spins on the boundary, then check the virtual time of the 
neighboring PE or PEs that the chosen spin interacts with. If at least one of 
the neighboring PEs has a virtual time less than the updating PE's virtual 
time of the next update, then wait until this is no longer the case. (For the 
square lattice, this checking and wait-until condition requires 2 PEs for corner 
spins and one PE otherwise.) 

• Update the spin using the same probabilities as in the standard dynamic Monte 
Carlo serial algorithm, i.e. with Eq. (|2|) or Eq. (^). 

• Add At to the local virtual time. 

• Determine the virtual time of the new next update by using the local time increment 
At=— ln(f), where f is a uniformly distributed random number. 



The performance of this algorithm is shown in Fig. 16, and its utilization is shown in 



Fig. 17 



Can the advanced dynamic Monte Carlo algorithms also be parallelized? Lubachevsky 
in 1988 |)7| proposed a method to parallelize in a conservative fashion the n-fold way 
algorithm. This was implemented and tested for the first time in 1999 [ 115 , |116|1 . The 



idea is to have each PE using the n-fold algorithm with equations similar to Eq. 
in discrete time or Eq. ([49]) in continuous time to update its virtual time. However, to 
ensure causality is not violated a 'shield' of spins at each surface acts as an absorbing 
state. In each block an additional absorbing class is defined which contains the spins on 
the boundary. There are N^=4(£— 1) of these spins. The original n-fold way tabulation 
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of the 10 spin classes for the square lattice are only performed for the N^=(£— 2) 2 spins 
in the kernel. Hence N^=J2i=i n i an d N^+N^—l 2 . Note that each PE keeps its own class 
populations and its own virtual time. For each escape the simulation is started with all 
spins up and each PE has its virtual time set to zero. Then each PE determines the time 
At from Eq. (j73|) of its first spin update. The inner loop of the continuous time algorithm, 
performed by every PE in parallel, is: 

• Use a uniformly distributed random number r to select a class according to the 
relative weights {Ny,, niPi, n 2 p2, • • ■ , n w p w }, then use another random number r\ to 
select a spin from the chosen class with equal probabilities. 

• If the spin is: 

o One of the [I— 2) 2 spins in the kernel, then flip this spin (with probability one 
— no rejection) and proceed to the next step. 

o One of the A£— 4 spins on the boundary, then wait until the local simulated 
virtual time of the next update is not greater than the virtual time on the 
neighboring PE(s). This spin may or may not flip: flip it using the flip prob- 
ability of Eq. (Q) or Eq. (^) and proceed to the next step. 

• If the chosen spin was flipped, update the 10 spin classes n,. 

• Add At to the local virtual time. 

• Use a random number f to determine the time of the new next update (in units of 
Monte Carlo steps — mcs) by using the local time increment 



ln(f) 

N h + EE n iPi 



^ = — (73) 



The performance of this algorithm with fixed £ as a function of iVpE is shown in Fig. [LJ| For 
the parameters of the figure, the parallel n-fold way algorithm is almost twice as fast as the 
parallel standard dynamic Monte Carlo algorithm. For the parallel n-fold way algorithm 
it was found that the average utilization (u) was about ~ (Fig. [17]). This utilization 
is lower than the utilization of the parallel standard Monte Carlo simulation (Fig. [17]). 
Nevertheless, as seen in Fig. ^ the n-fold way performs better in the number of spin flip 
trials per second per PE than does the parallel standard Monte Carlo algorithm. The 
introduction of the shield spins as an absorbing class limits the performance of the parallel 



n-fold way algorithm at low temperatures [ |115|| . This is because at low temperatures the 



flip probabilities for each class, Pi, are small and the absorption almost always will take 
place in the class with the shield spins. In this case the average time step is limited 
by the block size, and is approximately (At)^£/4 ||115 . 



6.0 Summary and Discussion 

Several advanced dynamic Monte Carlo algorithms have been introduced. All these algo- 
rithms preserve the underlying dynamic of the model — they just implement it in a more 
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intelligent fashion on the computer. One algorithm is the Monte Carlo with Absorb- 
ing Markov Chains (MCAMC) algorithm. The lowest-order, s—1, MCAMC algorithm 
corresponds to a discrete-time version of the n-fold way algorithm [33. 34|. The other 
algorithm is the projective dynamics algorithm. We have shown that these advanced al- 
gorithms can be applied to models to study escape from metastable states. In all cases, 
the advanced algorithms do not alter the underlying dynamic of the model, but only 
implement the underlying dynamic in a different fashion. These algorithms can be many 
orders of magnitude faster than the standard Monte Carlo simulations. Consequently, 
they allow simulations to be performed in parameter regimes it is impossible to study 
with standard algorithms. 

The MCAMC method was applied to a simple lattice model for motion of a random 
walker through a one- dimensional energy landscape in Sec. 4.0. This is a preliminary 
problem to using such methods to simulate motion of a particle through a free-energy 
landscape caused by a pore, such as ionic motion through a biological ion channel. It was 
shown that the n-fold way can be much faster than standard Monte Carlo. The n-fold way 
algorithm, sometimes called event-driven simulation or a rejection-free algorithm, corre- 
sponds to s—1 MCAMC. In this model the higher s MCAMC algorithms were introduced 
and found to sometimes be orders of magnitude faster than the n-fold simulations. For 
this simple model, the projective dynamics algorithm was found to be solvable for a finite 
lattice. For more realistic simulations of motion through pores this will no longer be the 
case, but the projective dynamics algorithm is expected to again be applicable. 

Metastable decay of the Ising ferromagnet was illustrated using the MCAMC and 
projective dynamics algorithms. Again, the higher s MCAMC algorithms were shown 
to often be many, many orders of magnitude faster than lower s MCAMC algorithms or 
standard Monte Carlo. The disadvantage of the MCAMC algorithms is that they require a 
substantial amount of bookkeeping. On the other hand, the projective dynamics algorithm 
can also be many orders of magnitude faster than conventional simulations, and it does not 
require very much bookkeeping. In addition, the projective dynamics method can provide 
the magnetization of the metastable and stable states as well as the intervening saddle 
point. However, if higher moments of the distribution of escape times are desired, the 
projective dynamics algorithm only provides approximations. The MCAMC algorithms 
were found to work best at low temperatures in reasonably strong fields: near the single- 
droplet regime of the 'metastability phase diagram'. The projective dynamics algorithm 
was shown to also work in the multi-droplet regime. 

Although both the MCAMC and projective dynamics algorithms were initially de- 
signed for dynamic Monte Carlo simulations of models with discrete states, they have 
recently started to be generalized to spin models with continuous degrees of freedom. 
Here although there is an underlying spin dynamic ||124|| which is used for example in mi- 
cromagnetic calculations [ 125 , 126 |, in some cases it is still possible to ascribe a physical 
meaning to a Monte Carlo move [127]. For the classical Heisenberg model recent ex- 
ploratory efforts to apply the n-fold way |129| , |130|1 and the projective dynamics methods 
[|128|[ have been encouraging. 

The advanced dynamic Monte Carlo algorithms presented here have the goal of accel- 
erating the simulation of dynamics for physical systems. Methods with the same goal for 
molecular dynamics simulations have also recently been advanced by Voter. These include 
the hyperdynamics [|131|| , parallel replica dynamics ||132j| , and temperature-extrapolated 
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dynamics ||133|| . These three methods can be used either individually or in combination for 
molecular dynamics simulations. They all assume that transition-state theory ||134| , |135 | 
is satisfied. This assumption is related to the assumption in the projected dynamics that 
the ratio of growing and shrinking probabilities is given by Boltzmann weights. As shown 
|66|1 for the Ising model, such an assumption works reasonably well in the single-droplet 



in 



regime but fails in the multidroplet regime. Furthermore, in Ref. [|66| the prefactor of the 
nucleation rate was shown to depend on the explicit dynamic used, even though the dy- 
namics all satisfied assumptions similar to transition-state theory. It can be anticipated 
that only in the regimes where the growth phase is unimportant will the transition-state 
theory assumptions be satisfied, and only there will Voter's advanced molecular dynam- 
ics algorithms give correct results. Conversely, neither the MCAMC nor the projective 
dynamics methods presented here depend on any assumptions related to transition-state 
theory. They are just different ways to implement the original dynamic. Hence these 
advanced dynamic Monte Carlo algorithms give the correct result. 

Another recent dynamic Monte Carlo algorithm called Transition Matrix Monte Carlo 



(TMMC) has been proposed by Wang, Tay, and Swendsen ||136|| . The idea is very similar 



to that of projective dynamics |84, 37 1. The underlying 2 N x2 N Markov matrix of the 



N spin Ising model is lumped using equations identical to Eq. (|57D and (^l|) but with 
energy rather than magnetization as the lumping variable. Wang et al. study the square- 
lattice Ising model at T c in zero field. The lumped Markov matrix they obtain is not 
tridiagonal like the projective dynamics Markov matrix in Sec. 5.4, so our proof that the 
dynamic is unaffected by the lumping is no longer applicable. In fact, they obtain dynamic 
relaxation times different from those of the unlumped Markov matrix. Consequently, the 
TMMC method must alter the underlying dynamics of the model. It would be interesting 
to project onto both slow variables, M and energy, for the Ising model |78|, [7^] . This 
can be done p6[ using the formal relationship between the projective dynamics and the 



Nakjima-Zwanzig ||137| , |138|| projection-operator formalism for the master equation, which 
is equivalent to Mori's [ 139|| projection-operator formalism for the equations of motion of 
observables ||140| , |141|| . Projecting onto both slow variables should make the underlying 



Markov matrix closer to weakly lumpable. 

In summary, the projective dynamics and MCAMC algorithms allow dynamic Monte 
Carlo simulations to be performed cheaper, better, faster — without changing the underly- 
ing dynamic. They should be used in all situations where long time dynamic Monte Carlo 
simulations are required to understand the underlying physics of the discrete-state-space 
model under investigation. 
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Figure 1: The dimensionless energy of each of the sites of a simple model for motion of 
a particle through a pore. 
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Figure 2: The average lifetime, in Monte Carlo steps (mcs), as a function of inverse 
temperature for particle motion in the energy arrangement of Fig. |l|. See Fig. ^ for an 
explanation of the plotting symbols. The solid line is the exact result using Eqs. (P0), 
and (1). 
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1 / Temperature 1 / Temperature 



Figure 3: The amount of Cray CPU time (in minutes) required to perform K=1000 
escapes as a function of inverse temperature. All simulations obtain the same average 
lifetime (within statistical errors) as shown in Fig. The symbols correspond to standard 
Monte Carlo (o), rejection free or n-fold way or event driven or s=l MCAMC (x), and 
s=2 MCAMC for the lattice pairs (5,6) and (8,9) in Fig. and s=l MCAMC otherwise 
(o). (a) The entire temperature range simulated, with solid lines joining the data points 
and the dashed lines extrapolated timings assuming that at low temperatures the CPU 
time required is an exponential in the inverse temperature, (b) The high-temperature 
portion of the same figure, showing where the algorithms cross over in speed. 
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Figure 4: The cross-over 'phase diagram for metastable decay' in the square-lattice Ising 
ferromagnet is shown. The strong-field (SF) regime, the multi-droplet (MD) regime, and 
the single-droplet (SD) regime are shown. The heavy dashed line separates the SF and 
MD regimes and is estimated using R c =a/2. The cross-overs between the SD and MD 
regimes depend on the system size and are shown for L—10, 20, 100, and 200. The solid 
lines starting at \H\—4J and T=0 are the single overturned spin cross-over between the 
SF and SD regimes, as described in the text. The vertical solid line shows the location of 
the critical temperature, T c . 
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Figure 5: For H=—3J with L=20, L=76, and L=240 data are shown for different 
dynamic Monte Carlo algorithms, (a) L 2 times the average lifetime, L 2 (t), for the square- 
lattice Ising ferromagnet is shown as a function of inverse temperature, note the units 
of L 2 times MCSS which is Monte Carlo steps (mcs). The vertical dashed line is the 
critical temperature. The dashed line corresponds to the low-temperature prediction 
(r) =exp(2 J /k^T) |65| , while the solid line corresponds to (r) = (5/4)exp(2J//cB^), which 
includes the exact low-temperature prefactor |)7|. Note that all algorithms produce the 
same lifetimes within statistical errors. In (b) the CRAY CPU time in minutes per escape 
is shown. Note that for the n-fold way algorithm (rejection free, s=l MCAMC) the CPU 
time per escape is almost flat. This is because for 2J<\H\<4J the nucleating droplet is 
a single overturned spin ||65|| . The s=2 MCAMC algorithm, shown only for L=76, is also 
approximately flat. Conversely, the standard dynamic Monte Carlo algorithm requires a 
CPU time that grows exponentially with inverse temperature, shown as the heavy solid 
line. 
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Figure 6: The average lifetime, (r), for the square-lattice Ising ferromagnet is shown 
as a function of inverse temperature for H=—0.75J for different L. The same data are 
plotted in (a) and (b) in different time units. The vertical dashed line is the critical 
temperature. In (a) the lifetime (r) is shown in units of the physical time, Monte Carlo 
Steps per Spin (MCSS). The dashed line corresponds to (r)=0.563 exp(16 J/3T) while 
the light solid lines connect the data points. In (b) the lifetime (r) is shown in units of 
MCSS times L 2 (or Monte Carlo steps — mcs). The heavy dashed line corresponds to 
(r)=0.186L~ 2 exp(13.5J/T) and the light solid lines connect the data points. In (a) the 
data points lie on top of each other in the SF and MD regime since the lifetime there is 
independent of L. In (b) the data points in the SD regime lie on top of each other since 
the lifetime there is inversely proportional to the volume. 



44 




Figure 7: The CPU time required for the simulation of the 20x20 Ising ferromagnets 
presented in Fig. ||is shown as a function of inverse temperature. The symbols are for stan- 
dard dynamic Monte Carlo (x), n-fold way simulations (□), s=2 MCAMC (o), and s=3 
MCAMC (•). (a) Uses a fit to A^exp [{T — T Q )/kBT] to estimate the low-temperature 
CPU time required for the slower algorithms. This fit is given by the corresponding heavy 
lines. The fit parameters are listed in Table 2. (b) Shows the region where simulations 
were actually performed. This shows the cross-overs between the timings of the various 
algorithms. 
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Figure 8: A schematic of the free energy per spin as a function of the total magnetization 
M for an Ising model. Shown are the metastable well, the stable well, and the barrier 
between the two wells. 
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Figure 9: The growing and shrinking probabilities for escape of the square-lattice Ising 
model are shown as functions of the total magnetization. Wherever the shrinking and 
growing probabilities are equal, the system is at an extremum of the free energy. These 
correspond from right to left in (a) to the metastable phase, the saddle point, and the 
stable phase, (b) Shows the portion near the metastable phase and saddle point, (c) Shows 
the portion near the equilibrium phase. The parameters are L=20, T=0.8T C «1.815J, 
H=— 0.15 J. The simulation is started with M=L 2 and is run until the first passage to 
M=-L 2 . 
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Figure 10: The negative of the natural logarithm of the residence time, h(M), as a 
function of the system magnetization M. The parameters are the same as in Fig. 
The integral of the residence times gives the average lifetime, which for these parameters 
is 1.1 xlO 3 MCSS. This should be compared with Fig. ||. Note that the stable phase 
is less probable here than in Fig. || since the simulation is stopped the first time the 
magnetization reaches M=—L 2 . 
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Figure 11: The average lifetime (r) measured from the growing and shrinking probabil- 
ities as a function of the forcing speed is shown by the filled symbols for a hard forcing 
wall (•). The corresponding open symbols (o) are the measured number of Monte Carlo 
steps per spin (MCSS) to escape from the metastable well with the forcing present. The 
lines connect the data points. The heavy dashed line corresponds to the result for (r) 
from standard simulations. These results are for L=20, /cb7"=0.5J, and \H\=0.75J. Note 
that once (r) is approximately independent of the forcing speed, there is a difference of 
about 10 5 between the results measured by the growing and shrinking probabilities and 
the actual escape time with the forcing wall. This 10 5 difference reflects the efficiency of 
the projective dynamics with a forcing wall for these parameters. 



47 




Figure 12: The average lifetime, (r), versus inverse temperature for L=20 and 
H=—0.75J. The s=3 MCAMC results (V) are shown for comparison. The solid dashed 
curve is from a simulation at k^T=J with the assumption that the spin classes are un- 
changed as T changes. It has the wrong functional form. The bullets (•) use this assump- 
tion, except the first 8 overturned spins use class populations obtained from equilibrium 
Monte Carlo simulations at fixed magnetization. The lines connect the data points. 
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Figure 13: The lifetimes obtained from extrapolating in L using Eqs. fl69|) and (|70|) from 
simulations at the smallest lattice size, L=20. This is for H=—0.75J and ksT=J. When 
the cutoff for the lifetime is kept constant at M=0, once the system is in the multi-droplet 
regime (for large L) the lifetime is independent of the system size (□). When the cutoff for 
the lifetime is not kept constant in M, but is rather constant in the number of overturned 
spins (here at L 2 — 200) for both the Monte Carlo (o) and extrapolated values (O) the 
lifetime depends on L. 
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Figure 14: Results for a three-dimensional 32 3 simple cubic Ising model using projective 
dynamics. This is for a temperature of /cb^=0.9026819J at \H\=1.3J and using a forcing 
speed of 5.0xl0 3 (for units see the text). These are plotted as a function of the total 
system magnetization. The average lifetime for this escape is (r)=7.3xl0 15 MCSS. (a) 
The growing and shrinking probabilities up to 1024 overturned spins, (b) A portion of 
(a) showing the growing and shrinking probabilities near the metastable state, (c) The 
residence time obtained using these growing and shrinking probabilities. 
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Figure 15: A portion of a lattice showing the arrangement of spins on a single processing 
element (PE) and its neighboring PEs. This is for £=6. The solid lines represent the 
boundaries of spins on a single processor, while the dashed lines isolate the boundary 
spins from the interior spins. 
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Figure 16: The update rate for the square-lattice Ising model (in units of a million 
Monte Carlo steps per second of wall-clock time) is shown as a function of the number 
of processing elements (PEs) of a Cray T3E, iVpE- The algorithm is implemented using a 
conservative discrete-event simulation approach. The filled squares are for the standard 
Monte Carlo algorithm, while the open squares are for the shielded n-fold way algorithm 
(in continuous time). The temperature is T=0.7T C and the applied field is |if |=0.2857J. 
Spins in blocks of £x£ are placed on each PE, with £=128. The size of the simulated 
lattice is L=£^Npe, so the largest lattice simulated has L=2560. The lines are guides for 
the eye. Note that a straight line fit would correspond to perfect scaling of the parallel 



algorithm. After Ref. [115 
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Figure 17: The average utilization, (u), is shown as a function of the number of processing 
elements (PEs) used. The parallelization algorithm uses a conservative discrete-event 



simulation approach ||97]. The filled squares correspond to the standard dynamic Monte 



Carlo simulation for the square-lattice Ising model shown in Fig. [Tjj, where the utilization 
plateaus at about 80%. Here each PE has an £x£ block of spins, with £=128. The 
average utilization for the n-fold way algorithm plateaus at about 50%. The short-dashed 
line corresponds to the worst-case utilization in the limit iVpe— >oo for a one-dimensional 
lattice and £=1 (one spin per PE). This is obtained from finite-size scaling for various 
lattice sizes ||117|| . The long-dashed line corresponds to the worst-case utilization in the 
limit iVpE— >oo for a two-dimensional square lattice and £=1 [ 119| , with results obtained 
by running large lattice simulations. 
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